Multi-view Metric Learning in Vector-valued Kernel Spaces
We consider the problem of metric learning for multi-view data and present a
novel method for learning within-view as well as between-view metrics in
vector-valued kernel spaces, as a way to capture multi-modal structure of the
data. We formulate two convex optimization problems to jointly learn the metric
and the classifier or regressor in kernel feature spaces. An iterative
three-step multi-view metric learning algorithm is derived from the
optimization problems. In order to scale the computation to large training
sets, a block-wise Nyström approximation of the multi-view kernel matrix is
introduced. We justify our approach theoretically and experimentally, and show
its performance on real-world datasets against relevant state-of-the-art
methods.
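The block-wise Nyström step can be pictured from the classical single-view case: a low-rank reconstruction of the full kernel matrix from a small set of landmark columns. The sketch below shows only this standard building block, not the paper's block-wise multi-view variant; the function name and the landmark choice are illustrative.

```python
import numpy as np

def nystrom_approximation(K, landmark_idx):
    """Approximate a PSD kernel matrix K from a subset of 'landmark' columns.

    K            : (n, n) full kernel matrix (passed here only for clarity;
                   in practice only K[:, landmark_idx] would be computed).
    landmark_idx : indices of the m << n landmark points.
    Returns the rank-m approximation  C @ pinv(W) @ C.T.
    """
    C = K[:, landmark_idx]              # (n, m) sampled columns
    W = C[landmark_idx, :]              # (m, m) landmark block
    return C @ np.linalg.pinv(W) @ C.T  # low-rank reconstruction
```

When the rank of the landmark block matches the rank of the full matrix, the approximation is exact; in general the quality depends on how well the landmarks cover the data.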
On multi-class learning through the minimization of the confusion matrix norm
In imbalanced multi-class classification problems, the misclassification rate
as an error measure may not be a relevant choice. Several methods have been
developed in which the performance measure retains richer information than the
mere misclassification rate: misclassification costs, ROC-based information,
etc. Following this idea of dealing with alternate measures of performance, we
propose to address imbalanced classification problems by using a new measure to
be optimized: the norm of the confusion matrix. Indeed, recent results show
that using the norm of the confusion matrix as an error measure can be quite
interesting due to the fine-grained information contained in the matrix,
especially in the case of imbalanced classes. Our first contribution consists
in showing that optimizing a criterion based on the confusion matrix gives
rise to a common background for cost-sensitive methods aimed at dealing with
imbalanced-class learning problems. As our second contribution, we propose an
extension of a recent multi-class boosting method, namely AdaBoost.MM, to the
imbalanced-class problem, by greedily minimizing the empirical norm of the
confusion matrix. A theoretical analysis of the properties of the proposed
method is presented, while experimental results illustrate the behavior of the
algorithm and show the relevance of the approach compared to other methods.
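As a rough illustration of the error measure involved, the sketch below computes the Frobenius norm of a row-normalized confusion matrix with the diagonal zeroed, so that only per-class error rates contribute. The exact norm and normalization used by the method may differ; this is one common choice, not the paper's definition.

```python
import numpy as np

def confusion_matrix_norm(y_true, y_pred, n_classes):
    """Frobenius norm of the row-normalized, off-diagonal confusion matrix.

    Row i holds the rates at which class i is predicted as each class j;
    zeroing the diagonal keeps only the misclassification rates, so the
    measure stays informative even when the classes are heavily imbalanced.
    """
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    row_sums = C.sum(axis=1, keepdims=True)
    C = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)
    np.fill_diagonal(C, 0.0)            # keep only per-class error rates
    return np.linalg.norm(C)            # Frobenius norm
```

Because each row is normalized by its own class size, a rare class that is always misclassified weighs as much as a frequent one, which is the intuition behind using this norm for imbalanced problems.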
Design and Implementation of a Type System for a Knowledge Representation System
A knowledge representation system (KRS) is made up of both a language to represent knowledge of a domain and well-defined reasoning facilities to infer new knowledge from known facts. This paper deals with KRSs close to frame-based systems, which include description logic systems and object-based systems. In these systems, the main relation that leads to inferences is subsumption. Knowledge terms are described through roles that refer either to other knowledge terms or to data types. Subsumption between term descriptions is usually interpreted as data-set inclusion, where a datum is either a knowledge term or an external term (integer, string, etc.). Although subsumption between knowledge terms is well defined, its interpretation on external data depends upon the host language, since these are actually the data types of the KRS. As a consequence, no KRS is able to integrate a new data type (e.g. Matrix) such that its values can be safely involved in subsumption and further inferences. This is the problem addressed in this paper. The proposed solution is the design of a polymorphic type system connected to both the KRS and the host language. It is designed so that it can extend the KRS with any data type interpretation available in the host language (library, user-coded). Meanwhile, the values of the new data type get safely involved in the KRS reasoning processes. The presented type system avoids the incompleteness of subsumption due to its incomplete processing of external data.
A Protocol to Detect Local Affinities Involved in Proteins Distant Interactions
The three-dimensional structure of a protein is constrained or stabilized by local interactions between distant residues of the protein, such as disulfide bonds, electrostatic interactions, hydrogen bonds, van der Waals forces, etc. The correct prediction of such contacts would be an important step towards the larger challenge of three-dimensional structure prediction. The in silico prediction of disulfide connectivity has been widely studied: most results were based on the few amino acids around bonded and non-bonded cysteines, which we call local environments of bonded residues. In order to evaluate the impact of such local information on residue pairing, we propose a machine-learning-based protocol, independent of the type of contact, to detect affinities between local environments that would contribute to residue pairing. This protocol requires learning methods that are able to learn from examples corrupted by class-conditional classification noise. To this end, we propose an adapted version of the perceptron algorithm. Finally, we test our protocol with this algorithm on proteins that feature disulfide or salt bridges. The results show that local environments contribute to the formation of salt bridges. As a by-product, these results demonstrate the relevance of our protocol. However, results on disulfide bridges are not significantly positive. There are two possible explanations: either the class of linear functions used by the perceptron algorithm is not expressive enough to detect this information, or the local environments of cysteines do not contribute significantly to residue pairing.
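For reference, the baseline being adapted is the classical perceptron; the paper's noise-tolerant modification is not reproduced here. A minimal sketch of the standard algorithm on ±1 labels:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Plain perceptron on labels y in {-1, +1}.

    The paper adapts this algorithm to tolerate class-conditional label
    noise; this sketch is only the noise-free baseline being adapted.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # margin violated -> update
                w += yi * xi
                b += yi
                updated = True
        if not updated:                   # converged on separable data
            break
    return w, b
```

On linearly separable data this converges in a finite number of updates; under class-conditional label noise the plain update rule is biased, which is precisely what motivates the adapted version described in the abstract.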
Cross-view kernel transfer
We consider the kernel completion problem in the presence of multiple views
in the data. In this context, data samples can be fully missing in some
views, creating missing columns and rows in the kernel matrices that are
calculated individually for each view. We propose to solve the problem of
completing the kernel matrices with the Cross-View Kernel Transfer (CVKT)
procedure, in which the features of the other views are transformed to
represent the view under consideration. The transformations are learned by
kernel alignment to the known part of the kernel matrix, which allows finding
generalizable structures in the kernel matrix under completion. Its missing
values can then be predicted from the data available in other views. We
illustrate the benefits of our approach on simulated data, a multivariate
digits dataset, and a multi-view gesture classification dataset, as well as on
real biological datasets from studies of pattern formation in early
Drosophila melanogaster embryogenesis.
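The kernel-alignment criterion mentioned above can be illustrated with the standard centered kernel alignment score: the cosine similarity of two centered kernel matrices under the Frobenius inner product. This sketch shows only the score itself; how CVKT optimizes the view transformations against it is not reproduced here.

```python
import numpy as np

def centered_kernel_alignment(K1, K2):
    """Centered kernel alignment between two (n, n) kernel matrices.

    Both matrices are double-centered, then compared by the cosine of the
    Frobenius inner product. The score is scale-invariant and, for PSD
    kernels, lies in [0, 1], with 1 meaning identical structure.
    """
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    num = np.sum(K1c * K2c)               # Frobenius inner product
    den = np.linalg.norm(K1c) * np.linalg.norm(K2c)
    return num / den
```

Maximizing such a score over a family of transformations of the other views drives the transformed kernel to match the known entries of the target view's kernel, which is the mechanism the abstract describes.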
Contributions of Algebraic Modeling to Object-Based Knowledge Representation: An Illustration in AROM
AROM is a knowledge representation system that relies, like UML class diagrams, on two complementary kinds of modeling entities: classes and associations. It integrates an algebraic modeling language (AML) that serves as the support for several inference mechanisms. This language allows writing equations, constraints, and queries involving instances of classes and associations. A type module in AROM makes it possible to extend the set of types (and hence of values and operators) supported by the AML. Through a description of AROM's AML, this article highlights what an algebraic modeling language brings to a knowledge representation system, both in terms of declarativity and in terms of the inferences it makes possible.
Objects, types and constraints as classification schemes (abstract)
The notion of classification scheme is a generic model that encompasses the kind of classification performed in many knowledge representation formalisms. Classification schemes abstract away from the structure of individuals and consider only a sub-categorization relationship. The product of classification schemes preserves the status of classification scheme and provides various classification algorithms that rely on the classification defined for each member of the product. Object-based representation formalisms often use heterogeneous ways of representing knowledge. In the particular case of the TROPES system, knowledge is expressed by classes, types, and constraints. We present how types and constraints can be expressed in a type description module that gives them the simple structure of classification schemes. This mapping allows the integration into TROPES of new types and constraints together with their sub-typing relation. Taxonomies of classes are then themselves considered to be classification schemes that are products of more primitive ones, and this information is sufficient for classifying TROPES objects.
The Multi-Task Learning View of Multimodal Data
We study the problem of learning from multiple views using kernel methods in a supervised setting. We approach this problem from a multi-task learning point of view and illustrate how to capture the interesting multimodal structure of the data using multi-task kernels. Our analysis shows that the multi-task perspective offers the flexibility to design more efficient multiple-source learning algorithms, and hence the ability to exploit multiple descriptions of the data. In particular, we formulate the multimodal learning framework using vector-valued reproducing kernel Hilbert spaces, and we derive specific multi-task kernels that can operate over multiple modalities. Finally, we analyze the vector-valued regularized least squares algorithm in this context, and demonstrate its potential in a series of experiments with a real-world multimodal data set.
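The separable multi-task kernel setting mentioned above admits a compact sketch: with a joint kernel G((x,t),(x',s)) = K(x,x')·B(t,s), where K compares inputs and B encodes task (modality) similarity, vector-valued regularized least squares reduces to one linear system over a Kronecker product. The function names and the regularization constant below are illustrative, not the paper's notation.

```python
import numpy as np

def vv_rls_fit(K, B, Y, lam=1e-2):
    """Vector-valued regularized least squares with a separable kernel.

    K : (n, n) input kernel, B : (T, T) symmetric task-similarity matrix,
    Y : (n, T) targets. Solves (kron(K, B) + n*lam*I) c = vec(Y) and
    returns the coefficients as an (n, T) matrix, one row per sample.
    """
    n, T = Y.shape
    G = np.kron(K, B)                       # (nT, nT) joint kernel matrix
    c = np.linalg.solve(G + n * lam * np.eye(n * T), Y.reshape(-1))
    return c.reshape(n, T)

def vv_rls_predict(K_new, B, C):
    """Predictions f(x, t) = sum_{i,s} K_new[x, i] * B[t, s] * C[i, s]."""
    return K_new @ C @ B                    # valid for symmetric B
```

Choosing B = I decouples the tasks into independent scalar regressions, while an informative B lets one modality's samples influence another's predictor, which is the coupling the multi-task view provides.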
Identification and Exploitation of Types in an Object-Based Knowledge Model
Object-based knowledge models (OKMs) suffer from an overloading in the use of their associated representation language. While this language aims to be suited to the computational representation of an application domain, we show that it is not appropriate to use it to define data structures that, although useful for representing the domain, have no direct meaning in that domain (e.g. a matrix in the domain of astronomy). This thesis proposes a two-level type system called METÈO. The first level of METÈO is a language for implementing the abstract data types (ADTs) needed for a minimal description of the relevant elements of the application domain. METÈO thus frees the representation language from a task it was not designed for. The second level of METÈO deals with the refinement of ADTs carried out in the description of representation objects. We recall the two interpretations of representation objects: the intension of an object is an attempt to describe what that object denotes in the application domain, namely its extension. The equivalence generally assumed between these two aspects of an object is an illusion, and moreover helps to defeat one of the true purposes of a knowledge model: supporting the most precise possible characterization of an application domain. The types of METÈO's second level are therefore devoted to representing and manipulating the intensions of objects, independently of their extensions. The extensional interpretation of objects is carried out by the user; METÈO internally manages the descriptions of these objects stripped of their meaning, and the OKM can then focus on the cooperation between these two aspects of objects, considered non-equivalent in this study.
METÈO thus helps clarify the role of each partner involved in building and exploiting a knowledge base. More generally, METÈO bridges the gap between the specific features of OKMs and the usual programming techniques for manipulable data structures. A prototype of METÈO has been developed for coupling with the TROPES OKM.